Abstract
BACKGROUND Accurate characterization of a hematologic malignancy is the first step toward correct, tailored treatment. Tumor whole transcriptome sequencing (WTS) has recently emerged as a universal technique that allows accurate identification not only of all possible fusion transcripts but also of point mutations, copy number alterations, and gene overexpression, and it is especially useful for diagnosing B-cell acute lymphoblastic leukemia (ALL). Since WTS provides the gene expression landscape of the tumor sample, we investigated whether expression signatures could also be used to accurately classify the full spectrum of pediatric hematologic malignancies.
METHODS We selected hematologic disorders with a minimum of 3 samples per condition in the St. Jude Children's Research Hospital and Princess Maxima Center for Pediatric Oncology repositories. In total, this yielded WTS count data for 2914 samples across 69 distinct hematologic malignancy subtypes. The conditions spanned acute myeloid and lymphoid leukemias, Hodgkin and non-Hodgkin lymphomas, myelodysplastic syndrome, chronic myeloid leukemia, and others, including rare entities such as acute myeloid leukemia with KMT2A-partial tandem duplication or T-ALL with TLX1 activation.
Raw gene counts from the WTS data were used as model inputs. To focus on genes relevant to tumorigenesis, a list of 1245 known tumor drivers was selected based on OncoKB and CosmicDB. To improve classifier performance, the gene counts were enriched with statistical features (total sum, mean, kurtosis and skew). Different machine learning models (linear regression, random forest, support vector classifier, multilayer perceptron and 3 types of gradient boosting models) were trained using 3-fold cross-validation (chosen because the rarest conditions had only 3 samples). The F1 score (harmonic mean of precision and recall) was used as the primary metric to account for the imbalance between hematologic conditions. Finally, external model validation was performed on an independent dataset of 28 hematologic malignancy samples covering 9 conditions from Children's Clinical University Hospital, Riga, Latvia.
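As an illustration of the feature enrichment step, the following is a minimal sketch that appends the four summary statistics to one sample's gene counts. It assumes the statistics are computed per sample across the selected genes, uses population moments, and reports excess kurtosis; the abstract does not specify these details, and the function name `enrich_counts` is hypothetical.

```python
def enrich_counts(counts):
    """Return the gene counts of one sample followed by
    [total sum, mean, excess kurtosis, skewness].

    Assumption: statistics are taken across genes within a sample,
    using population (biased) central moments.
    """
    n = len(counts)
    total = sum(counts)
    mean = total / n

    # Population central moments of order 2, 3 and 4
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    m4 = sum((c - mean) ** 4 for c in counts) / n

    if m2 == 0:
        # All counts identical: skewness and kurtosis are undefined, use 0.0
        skew, kurt = 0.0, 0.0
    else:
        skew = m3 / m2 ** 1.5
        kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a normal distribution)

    return list(counts) + [total, mean, kurt, skew]


# Toy example with five hypothetical gene counts
features = enrich_counts([1, 2, 3, 4, 5])
```

In this sketch the enriched vector has four more entries than the input, so a sample with 1245 driver gene counts would yield 1249 features.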
RESULTS The best-performing model was a voting ensemble combining 7 different classifiers, with an internal cross-validation accuracy of 0.82 and an F1 score of 0.59. Performance on the independent test set was similar, with an overall accuracy of 0.82 and an F1 score of 0.53. WTS in the test set identified all fusions that were also detected by the standard method. The model additionally detected clinically relevant events, such as a case of B-ALL with PAX5-JAK2 fusion that was missed by the standard method.
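A gap between accuracy and F1 like the one reported above is what one would expect if F1 is averaged per class over imbalanced conditions. The sketch below assumes macro averaging (the abstract does not state the averaging scheme) and uses made-up toy labels, not study data, to show how a single missed rare class lowers macro F1 far more than accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: one common class and two rare ones
y_true = ["B-ALL"] * 8 + ["T-ALL", "AML"]
y_pred = ["B-ALL"] * 8 + ["B-ALL", "AML"]

acc = accuracy(y_true, y_pred)   # 9 of 10 samples correct
f1 = macro_f1(y_true, y_pred)    # the missed T-ALL class contributes an F1 of 0
```

Here one misclassified rare sample leaves accuracy at 0.9 but pulls macro F1 down to roughly 0.65, because every class counts equally regardless of its size.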
Conditions with similar expression patterns, such as B-ALL with ETV6-RUNX1 fusion and B-ALL ETV6-RUNX1-like, were the most challenging to distinguish. Notably, acute megakaryoblastic leukemia cases, characterized by HOXA9 gene rearrangements, were classified as KMT2A-rearranged AML, likely due to the shared transcriptional activation of HOXA9.
CONCLUSION We propose a method to identify hematologic malignancy based on WTS gene counts. This approach enables the classification of diverse disease subtypes without the need for direct fusion detection and can complement existing diagnostic modalities.
Finally, the classifier results also reveal transcriptional convergence between certain disease subtypes, such as KMT2A- and HOXA9-driven leukemias, which may represent a shared target for emerging compounds such as menin inhibitors.